Outlines of Statistical Techniques,
Applications, and Programs for
Industry, Engineering and Science



This manual outlines nine statistical techniques, giving simple
definitions and examples, a summary of input and output, and 
references to numerous applications and computer programs. 
The techniques covered are: correlation, factor analysis, 
cluster analysis, regression, discriminant analysis, 
experimental analysis, evolutionary operation, Bayes 
formula, and time series analysis. 
INTRODUCTION



We need the quantitative methods of statistics both to clarify and to solve 
problems in every science, engineering discipline, and industrial 
enterprise. 

This manual spotlights nine statistical techniques. They are general- 
purpose tools and attack problems in which many variables or factors 
operate simultaneously. They thrive where data are highly variable and 
where no neat, determinate mathematical model is known. Widely di- 
vergent groups — behavioral scientists, electrical engineers, steel 
manufacturers — use these techniques. 

Computers handle statistical techniques with great speed and accuracy.
They process many measurements on a great number of factors and
allow easy experimentation and analysis.

We cover the nine techniques in a short, simple manner. A thorough
familiarity with statistics is not required to read this manual. The
essential feature of each technique, its wide applicability, and its
perspective are made plain without cautions, hedgings, assumptions, or
mathematical precision. This manual does not teach statistics, computer
programming, or the subject matter implied in the illustrations (whether
in geology, aeronautical engineering, or petroleum refining). However,
for each technique area this manual:

• gives a simple definition

• illustrates its application

• tells the type of data you begin with (inputs) and
  what answers you end with (outputs)

• references numerous applications

• lists some available computer programs.



Remember in each of the nine sections to follow that the statistical tech- 
nique is only loosely defined. Generally, a technique has to satisfy many 
conditions to be validly used, and is not infallible in effect. However, 
some techniques are now controlling huge industrial operations. Others 
are providing researchers new insights into what were once complicated
or puzzling situations.

Three kinds of listings accompany the majority of the techniques: 

• a file of applications with references 

• a list of computer programs in the "IBM Catalog 
of Programs" series 

• a list of references which cite the use of IBM 
Systems in carrying out a technique. 

The first item above covers applications and examples, some tutorial, 
some in actual operation. 



The second item cites computer programs for the techniques covered, 
as well as for related methods. Each computer system has a program 
library contributed to by IBM customers or IBM personnel. The so-called 
Type I and Type II programs are supported by documentation and test 
procedures. IBM serves solely as the distribution agent for Type III and
Type IV programs. More details on each program referenced can be
found in the "IBM Catalog of Programs" appropriate to a machine (avail- 
able through a local IBM Branch Office). There are, of course, hundreds 
of statistical programs in use other than those in the IBM Catalog. 

The third item provides a lead to other possible sources of programs. 

References are to be found in alphabetic order in the back of this manual. 
A citation is often made to a later source rather than to the original one. 
At times a judgment about the use of a technique is based only on a title 
or an abstract. "Computer Abstracts" and "The H. W. Wilson Company 
Indexes" proved useful in gathering raw data on the use of a computer 
or a technique. 

Readers are encouraged to recommend other techniques, and contribute 
new citations to applications or programs. Write: Technical Publications, 
IBM Corporation, Data Processing Division, 112 East Post Road, White 
Plains, New York, 10601. We welcome comments and criticisms, too. 



CORRELATION 

in essence
Correlation analysis measures the strength of relationship between two
or more variables.

example in paper-making
Thirteen kinds of tests on each of a great number of rolls of paper
were reduced to four by correlation analysis. Ten of the tests
were so highly correlated (when one had high values, the others did too;
when one had low values, so did the others) that nine of those ten
tests could be dispensed with. (Ref. Draper, p. 315)

example in nutrition
Correlations among 184 nutritional, metabolic, biochemical, and
physiological variables (vitamin intake, mineral intake, amounts
absorbed, characteristics of subjects, etc.) revealed important
interrelations. Significant insights came from observing how the
correlations changed as the experiment progressed in time. (Ref. James)

summary
From repeated measurements on a number of variables, correlation
analysis measures the strength of association between every pair of
variables. Degree of association runs from +1.0 (perfect direct
association), down through 0.0 (no correlation), to -1.0 (perfect
inverse association). In the paper-making example, you search for high
correlations (either high direct or high inverse) to cut out
superfluous tests. In the nutrition example, high correlations point to
possible cause and effect relations (although high correlation need
not, and usually does not, imply cause and effect). Many varieties of
correlation measures exist.
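The pairwise measure described in the summary can be sketched in a few lines of Python. This is only an illustration, assuming the ordinary product-moment (Pearson) coefficient as the correlation measure; the function names and the three sets of measurements are hypothetical and do not come from any referenced study or IBM program.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length lists of measurements."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlation for every pair of variables, keyed by (name, name)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical repeated measurements on three variables.
data = {
    "V1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "V2": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with V1 (high direct correlation)
    "V3": [5.0, 4.1, 2.9, 2.2, 1.0],  # falls as V1 rises (high inverse correlation)
}
r = correlation_matrix(data)

# Pairs whose correlation is high in either direction are candidates for pruning.
redundant = [pair for pair, value in r.items() if pair[0] < pair[1] and abs(value) > 0.95]
```

Here `redundant` lists the variable pairs that move together so closely that one member of each pair might be dropped, as in the paper-making example.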
CORRELATION
ANALYSIS

        V1     V2     V3     V4
V1     1.0     .9     .1     .2
V2      _     1.0     .7     .1
V3      _      _     1.0    -.8
V4      _      _      _     1.0


Input:

repeated measures on a set of variables
V1, V2, V3, V4. These four
variables could be measures of height,
weight, pulse, and blood pressure on
individuals M1, M2, . . .



Output:

correlations: a high 0.9 for V1 and
V2; a low 0.1 for V2 and V4; a
high inverse -0.8 for V3 and V4.
Now highly correlated variables can
be collected and/or some eliminated.
Searches can be made for possible
causes and effects, when appropriate.



applications
A few applications of correlation analysis:

Subject                      Reference

earth sciences               Miller, ch. 13
psychotropic problems        Hall
paper testing                Draper, p. 315
nutrition, metabolism        James
geochronology                Martin
educational measurements     Cooley, p. 21
heart pathology              Tolles



catalog of programs
Programs available from "IBM Catalog of Programs" (Ref. IBM) in
correlation analysis and related areas:

Subject                 Computer   Program Form Number

correlation             7070       7070-11.3.005, .008
correlation programs    1620       1620-06.0.013, .015, .021,
                                   .022, .038, .039,
                                   .040, .051, .064,
                                   .097, .104, .114,
                                   .121, .125, .162,
                                   .170, .175, .188,
                                   .205
correlation             1401       1401-06.0.005, .006
correlation             360        360A-CM-03X
                        1130       1130-CM-02X


use of IBM systems
Citations in trade journals, periodicals and texts on the use of IBM
systems in carrying out correlation analysis:

Subject                           Computer   Reference

correlation pairing               709/0      Priest
tetrachoric correlation           709/0      Castellan
correlation, health of aviators   1620       Osborne
correlation                       709        Sorenson
biserial correlation              709/0      Castellan (2)
BIMED programs (14)               7090       Massey
nutritional relationships         1620       James
geochronology                     7094       Martin



FACTOR ANALYSIS 



in essence
Factor analysis finds new, more fundamental quantities (the factors)
underlying measured variables.

example in advertising
In an advertising effectiveness study, 20 variables (size of ad, colors,
type sizes, copy blocks, product facts, benefits, pictures, readership,
etc.) bunch into 6 groups or factors. One factor related to the
pictorial and color variables, a second to ad size variables, a third to
typography variables, a fourth to information variables, etc. (Ref.
Ferber, p. 101)

example in credit
A factor analysis of 22 questions on a credit request revealed 6 basic
factors: two connected with questions on the transaction, and four
related to questions on personal history. (Ref. Myers)

example in psychology
Some of the 48 scores (variables) on a Rorschach test depended too
heavily on the number of responses given. Factor analysis into one
general factor (interpreted as productivity) and 15 other factors,
independent of the general factor, removed the difficulty. (Ref. Cooley,
p. 164)

summary
With repeated measurements on a set of variables, factor analysis
discerns underlying factors distinct from, and fewer in number than,
the original variables. Some factor analytic methods seek out a general
factor present in all original variables (the Rorschach example). In
other methods, original variables are grouped so as to depend on a few
distinct factors (advertising and credit examples).

Formulas which show each variable as a combination of the factors are
important outputs of factor analysis. For example, in the advertising
illustration above, the variable representing "number of words" equals
0.05 F1 + 0.46 F2 - 0.10 F3 + 0.71 F4 + 0.00 F5 - 0.03 F6.
The 6 variables which, like this one, had a strong contribution from F4
all seem to concern the "information factor" in the ads.
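Reading such a formula can be sketched in Python as follows. The loadings are those quoted above for "number of words"; the helper names are hypothetical, and treating the sum of squared loadings as the share of variance explained assumes uncorrelated (orthogonal) factors, which the text does not state.

```python
# Loadings of "number of words" on factors F1..F6, as quoted in the text.
loadings = {"number of words": [0.05, 0.46, -0.10, 0.71, 0.00, -0.03]}

def dominant_factor(row):
    """1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda i: abs(row[i])) + 1

def communality(row):
    """Sum of squared loadings (share of variance explained, if factors are orthogonal)."""
    return sum(value * value for value in row)

def group_by_factor(table):
    """Bunch variables under the factor on which each loads most heavily."""
    groups = {}
    for name, row in table.items():
        groups.setdefault("F%d" % dominant_factor(row), []).append(name)
    return groups

groups = group_by_factor(loadings)
shared = communality(loadings["number of words"])
```

With these loadings the variable falls under F4, the "information factor", and `shared` comes to roughly 0.73.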
FACTOR ANALYSIS

LOADINGS ON FACTORS F1 AND F2:

V1 = .68 F1 + .18 F2
V2 = .92 F1 + .06 F2
V3 = .02 F1 + .80 F2
V4 = .11 F1 + .76 F2


Input:

repeated measurements of variables
V1, V2, V3, V4 (e.g., size, width,
color, words tallied for a series of
ads M1, M2, etc.).



Output:

formulas for four variables in terms
of fewer factors. V1 and V2 are
based mainly on F1; V3 and V4 on
F2. You now have fewer factors to
deal with. Hopefully they are more
meaningful than the original four
variables.



applications
A few applications of factor analysis:

Subject                              Reference

metallurgy, steels, blast furnace    Spurrell
psychology, multiple time series     Anderson
metropolitan economy                 Carleton
industrial relations                 Boehr
consumer attitudes                   Adams
psychology                           Overall
physical fitness tests               Falls
psychological data                   Schönemann
tests on paper products              Draper, p. 316
refinery process                     Thomas
botanical applications               Pearce
language ability                     Weaver
listening tests                      Bateman
retail credit                        Myers
census data                          Massey, W.F.
earth sciences                       Miller, ch. 13
Rorschach test (48 dimensions)       Cooley, p. 164



catalog of programs
Programs available from "IBM Catalog of Programs" (Ref. IBM) in
factor analysis and related areas:

Subject                     Computer   Program Form Number

factor analysis             7070       7070-11.3.005, .008
factor analysis             1620       1620-06.0.053, .091, .094,
                                       .103, .145, .169
factor analysis, varimax    360        360A-CM-03X
                            1130       1130-CM-02X



use of IBM systems
Citations in trade journals, periodicals, and texts on the use of IBM
systems in carrying out factor analysis:

Subject                         Computer   Reference

factor analysis, oblique        7094       Hendrickson
multivariate statistics         7090/4     Jones
nonlinear factor analysis       7090/4     McDonald
factor analysis, rotation       7090       Wolf
principal axes                  650        Burket
principal components            7070       Bendig
BIMD programs (14)              7090       Massey, F.
convulsive disorders            704        Rodin
principal components            7090       Steidler
factor analysis, square root    7090       Lingoes
principal components            1620       Moore, D.W.
factor analysis (3 mode)        709        Walsh
principal axes                  709        Burket
psychological data              7094       Schönemann
language ability                709        Weaver



CLUSTER ANALYSIS 

in essence
Cluster analysis groups items or individuals by means of their
characteristics.

example in bacteriology
Cluster methods applied to bacteria have grouped them into their proper
orders, or grouped strains within a species. Classification of an
atypical plague bacillus later proved accurate. (Ref. Sokal, p. 259)

example in education
Attitudes toward mathematics were thought to have a significant relation
to general personality variables. In the study which confirmed this
hypothesis, 42 personality variables (dominance, sociability, tolerance,
etc.) clustered as follows. Variables 1, 2, 3, 7, 9, 10 constituted an
"extroversion" group. Variables 4, 5, 6, 7, 8, 10, a "conscientiousness"
cluster. Variables 5, 7, 11, 14, a "self-control" cluster, etc.
(Ref. Aiken)

summary
From characteristics of individuals in a large, unorganized collection,
cluster analysis groups similar individuals together. Among the many
variants of cluster methods, certain ones place the clusters into
hierarchies according to degree of similarity (the bacteriology reference
above). The technique also can cluster attributes (variables) and thus is
similar in purpose to factor analysis.
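The idea of clusters that merge as the required degree of similarity is relaxed can be sketched in Python. This is one simple variant only (single-linkage grouping), not the method of any cited reference; the similarity measure 1/(1 + distance), the item names, and the measurements are all assumptions made for illustration.

```python
import math
from itertools import combinations

def similarity(a, b):
    """Assumed similarity measure: 1.0 for identical items, falling toward 0 with distance."""
    return 1.0 / (1.0 + math.dist(a, b))

def clusters_at(items, threshold):
    """Single-linkage grouping: items are joined (transitively) when similarity >= threshold."""
    parent = {name: name for name in items}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(items, 2):
        if similarity(items[a], items[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in items:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical items, each with two measured characteristics.
items = {
    "I1": (1.0, 1.0), "I2": (1.2, 1.1),
    "I3": (5.0, 5.0), "I4": (5.3, 5.1),
    "I5": (9.0, 1.0),
}
fine = clusters_at(items, 0.9)    # strict similarity: every item stands alone
medium = clusters_at(items, 0.6)  # close pairs merge
coarse = clusters_at(items, 0.1)  # loose similarity: one all-inclusive cluster
```

Running the grouping at several thresholds gives the hierarchy: many small clusters at high similarity, coalescing into fewer, larger ones as the threshold drops.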
CLUSTER
ANALYSIS

[Figure: tree diagram of the items plotted against a SIMILARITY scale,
with branches labeled CLUSTER 1, CLUSTER 2, CLUSTER 3, and CLUSTER 4.]


Input:

four characteristics C1 to C4 measured
on nine items. For example,
length, width, wing span, and
antenna length are measured on
nine insects.



Output:

at about 0.6 on a degree of similarity
scale, four clusters emerge. Cluster 1
contains items I2, I7, I6. Cluster 2
has only I3, etc. At 0.2 on the similarity
scale, only two clusters emerge:
cluster 4 and a coalescence of clusters
1, 2, 3. You might be searching for
a genus at similarity = 0.2; or for a species
at similarity = 0.6; or strains at
similarity = 0.9.



applications
A few applications of cluster analysis:

Subject                                           Reference

botany (rice, manioc, farinosae)                  Sokal
paleontology (fish)                               Sokal
bacteria (soil, plague, actinomycetes)            Sokal
viruses, genes, protein patterns                  Sokal
plant ecology, soil classification, plankton      Sokal
physical anthropology                             Sokal
medical diagnosis                                 Sokal
rock identification                               Sokal
oil exploration                                   Sokal
legislative voting patterns                       Sokal
language translation                              Sokal
pattern recognition                               Sokal
archeology                                        Sokal
library, document classification                  Sokal
philology, authorship tests                       Sokal
leukemia classification                           Whitfield
information retrieval                             Sokal
zoology (bees, mosquitoes, man, cats, sponges)    Sokal
earth sciences                                    Miller
math and personality                              Aiken



catalog of programs
Programs available from "IBM Catalog of Programs" (Ref. IBM) in
cluster analysis and related areas:

Subject                 Computer   Program Form Number

optimal clustering      7092       7092-G2 IBM0026
continuous variables    7090       7090-ZO IBM0015
binary variables        7090       7090-ZO IBM0002
nonnumeric              1620       1620-06.0.201



use of IBM systems
Citations in trade journals, periodicals and texts on the use of IBM
systems in carrying out cluster analysis:

Subject                     Computer   Reference

taxonomic classification    7090       Lingoes (4)
disease classification      7090       Bonner (2)
cluster analysis            1620       Hyvarinen
pattern recognition         7090       Bonner



REGRESSION 

in essence
Regression analysis computes prediction formulas from data.

example in metallurgy
Regression related the Charpy fracture temperature to 11 variables
(percent carbon C, manganese Mn, phosphorus P, etc.) by the formula

(311.7)C + (133.6)Mn + (1194.7)P + (57.5)Si - (26.4)Ni + ...

(Ref. Pehlke, p. 58)

example in hospitals
Regression found equations to predict hospital beds needed in various
case categories. For example, neurology cases per month = 6.65 +
(0.635)X17 + (3.47)X18, where variable X17 is Indiana deaths and X18 is
Indiana births. One hundred seventeen variables (births, accident rates,
number of specialists, etc.) were screened. (Ref. Beenhakker)

example in agriculture
The method of regression found the explicit formula for milk output
as a function of X1 = concentrate feed, X2 = hand feed, X3 = grassland
acre days, and X4 = average cows in herd. The decimal exponents on
each of the variables X1 through X4 can be handled by using logarithms.

\[ \text{milk output} = 39.80\, X_1^{0.24}\, X_2^{0.15}\, X_3^{0.02}\, X_4^{0.05} \]

(Ref. Cowling)

summary
Starting with data on several variables and an idea of the general
formula connecting the variables, regression analysis finds the specific
numerical formula relating the variables. Statements can usually be
made about how confident you are in the formula and in predictions using
it. For example, in the agriculture example above, about 75 percent of
the variation in the data is explained through the formula for milk output.
Reduction of masses of data to simple equations makes regression
important. Two other advantages are understanding the nature of a process
and prediction of future events.
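The remark that decimal exponents can be handled by logarithms can be sketched in Python: taking logs turns a product of powers into an ordinary linear formula, which least squares then fits. This is a minimal illustration, not any catalogued IBM program; the data, the constant 2.0, and the exponents 0.5 and 0.25 are hypothetical, and the fit uses the normal equations solved by Gaussian elimination.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, y, coef):
    """Fraction of the variation in y explained by the fitted formula."""
    predicted = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in rows]
    mean = sum(y) / len(y)
    residual = sum((a - p) ** 2 for a, p in zip(y, predicted))
    total = sum((a - mean) ** 2 for a in y)
    return 1.0 - residual / total

# Hypothetical data generated from output = 2.0 * X1**0.5 * X2**0.25 (no noise).
x = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0), (6.0, 2.0)]
y = [2.0 * x1 ** 0.5 * x2 ** 0.25 for x1, x2 in x]

# log(output) = log(constant) + e1*log(X1) + e2*log(X2) is linear in the logs.
log_x = [(math.log(x1), math.log(x2)) for x1, x2 in x]
log_y = [math.log(value) for value in y]
coef = fit(log_x, log_y)
constant, e1, e2 = math.exp(coef[0]), coef[1], coef[2]
explained = r_squared(log_x, log_y, coef)
```

Because the sample data contain no random variation, the fit recovers the constant and exponents almost exactly; with real measurements `explained` would fall below 1, as in the 75 percent figure quoted for the milk-output formula.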



      V1   V2   V3
D1     _    _    _
D2     _    _    _
D3     _    _    _

FORMULA: V3 = A + B(V1) + C(V2)



REGRESSION
ANALYSIS



FORMULA: V3 = 0.1 + 1.2(V1) + 0.6(V2)
R² = 0.76
F-VALUE = 4.06
STANDARD ERROR B = 0.1
STANDARD ERROR C = 0.02



Inputs:

data on three variables V1, V2, V3.
Also, the general form of one in
terms of others.



Outputs:

the numeric coefficients in the
regression formula: A = 0.1, B = 1.2,
C = 0.6. Also, certain quantities
used to measure confidence in the
formula or in the coefficients.
R² = 0.76 means 76 percent of the
variation in V3 is explained by the
equation in V1 and V2. The formula
has summarized much data;
it can be used to predict a new V3
from V1 and V2.



applications
A few applications of regression analysis:

Subject                                            Reference

paper industry                                     Moore, P.G.
agriculture, milk production functions             Cowling
weather forecasting                                Glahn
educational measurements                           Cooley, p. 38
meteorology                                        Lund
bank debits as an indicator                        Carleton
construction economics                             Dilbeck
urban transportation                               Kain
U.S. import demand                                 Reimer
wartime production                                 Rapping
urban refuse collection                            Hirsch
open hearth production                             Leckie
earth sciences                                     Miller, ch. 8, 9, 17
gasoline production                                Ostle, p. 167, 183
psychological tests, canonical correlation         Meredith
psychological scores                               Meredith (2)
forging, metallurgy, lattice parameters            Pehlke
economic model of United Kingdom (simultaneous)    Ball
inertial navigation, error sources                 Eisner
aerodynamics, downwash                             Fromme
dairy production                                   Jarrett
X-ray fluorescence                                 Alley
foundry steels                                     Sprinkle
hospital bed needs                                 Beenhakker



catalog of programs
Programs available from "IBM Catalog of Programs" (Ref. IBM) in
regression analysis and related areas:

Subject       Computer   Program Form Number

regression    650        0650-00.0.056
regression    705        0705-11.3.001
regression    1410       1410-11.3.001, .002
regression    7070       7070-11.3.001, .007, .011
regression    1620       1620-06.0.001, .003, .006,
                         .031, .042, .049,
                         .057, .066, .077,
                         .084, .101, .118,
                         .120, .122, .142,
                         .143, .154, .157,
                         .159, .168, .173,
                         .181, .187








Subject                     Computer   Program Form Number

nonlinear regression        704        0704-G2 3226N11
iterative least squares     709        0709-E2 3024LSQ
orthogonal polynomials      709        0709-E2 3197CF
systems of equations        709        0709-F1 3090NORM
mortality curves            7040       7040-G1 3356MORT
nonlinear least squares     7040       7040-G2 3094LIN
stepwise regression         7040       7040-G2 3205RRG
polynomial fit              7090       7090-E2 3289PLYF
regression                  7090       7090-G2 3104RGNL
stepwise regression         7090       7090-G2 3143MPR2, MPR3
differential equations      7090       7090-G2 3146NLR
constrained regression      7094       7094-E2 3363BJ06
polynomial least squares    7094       7094-E2 3372AM26
regression                  1401       1401-06.0.002, .003, .004,
                                       .005, .007, .008
regression subroutines      360        360A-CM-03X
(multiple, polynomial,      1130       1130-CM-02X
canonical)



use of IBM systems
Citations in trade journals, periodicals, and texts on the use of IBM
systems in carrying out regression analysis:

Subject                       Computer   Reference

resonance spectra             1410       Nelson
multivariate programs (30)    7090/4     Jones
regression prediction         7090       Lingoes (2)
decay data                    0650       Worsley
simultaneous regression       7090       Lingoes (3)
psychophysiology              1620       Williams
multiple regression           7090       Steidler
heart disease                 0650       Ward
tomato factors                7070       Mittler
chemical analysis             0650       Winchell
BIMD programs (14)            7090       Massey, F.
generalized regression        0704       Eisenpress
metallurgy                    0704       Pehlke
aerospace navigation          7090/4     Eisner
hospital bed prediction       7090       Beenhakker






DISCRIMINATION 



in essence
Discriminant analysis assigns individuals to known groups.

example in anthropology
Four measurements, X1, X2, X3, X4, taken on each of many human
and chimpanzee fossil teeth yielded a discriminant function

D = X1 - (7.49)X2 + (2.34)X3 + (4.70)X4

with D averaging -5. ± 2.45 for human teeth and averaging +17.6 ± 2.45
for chimpanzee teeth. The Taungs skull, with D = -7.9, was subsequently
classified as probably human. (Ref. Keeping, p. 366)

example in linguistics
Three hundred words in each of three languages (English, Swedish,
and Finnish) were treated by discriminant analysis. Quantitative
measures (e.g., number of A's, of B's, . . . , of syllables, etc.)
were taken on each word. A first discriminant function separates
Finnish from the other two; a second function distinguishes English and
Swedish. (Ref. Mustonen)

summary
Starting from known groups of individuals, each individual with measured
characteristics, discriminant analysis derives the so-called discriminant
functions. They allot the originally given individuals to their proper group
and new individuals to an appropriate group. The functions can't
always cleanly distinguish the groups as they did in the linguistics
example. A misclassification percent measures the effectiveness of
the discriminant functions in allocating the original individuals into
their known groups. Cluster analysis differs from discriminant analysis
in that cluster analysis discovers groups, whereas discriminant analysis
begins with recognized groups.
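Applying a discriminant function once it has been derived can be sketched in Python. The coefficients and the group averages are those quoted in the anthropology example, with the human average taken as -5.0 (an assumption, since the text gives it only as "-5."). The nearest-average rule, the function names, and the five labelled D values are hypothetical illustrations, not the procedure of the cited reference.

```python
def discriminant(x1, x2, x3, x4):
    """Discriminant function quoted in the anthropology example."""
    return x1 - 7.49 * x2 + 2.34 * x3 + 4.70 * x4

# Average D for each known group (human value assumed to be -5.0).
GROUP_MEANS = {"human": -5.0, "chimpanzee": 17.6}

def classify(d):
    """Assign to the group whose average D lies nearest (an assumed, simple rule)."""
    return min(GROUP_MEANS, key=lambda group: abs(d - GROUP_MEANS[group]))

def misclassification_percent(labelled):
    """Percent of individuals with known groups that the rule puts in the wrong group."""
    wrong = sum(1 for d, group in labelled if classify(d) != group)
    return 100.0 * wrong / len(labelled)

taungs = classify(-7.9)  # D value reported for the Taungs skull

# Hypothetical D values for individuals whose groups are already known.
known = [(-6.1, "human"), (-3.0, "human"), (7.0, "human"),
         (16.0, "chimpanzee"), (19.2, "chimpanzee")]
error_rate = misclassification_percent(known)
```

The Taungs value of -7.9 lies far closer to the human average than to the chimpanzee average, so the rule assigns it to the human group; `error_rate` shows how a misclassification percent would be tallied on individuals of known origin.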



               C1   C2   C3

GROUP 1   I1    _    _    _
          I2    _    _    _

GROUP 2   I3    _    _    _
          I4    _    _    _
          I5    _    _    _



DISCRIMINANT
ANALYSIS



THE DISCRIMINANT
FUNCTION FOUND IS
F1 = 2.6 + 1.1(C1) + 0.9(C2) - 7.0(C3)
WITH AVERAGE VALUE
FOR GROUP 1 = 18.2 ± .9
    GROUP 2 = -1.3 ± .9



Inputs:

the five individuals I1 to I5 fall into
two known groups. Characteristics
C1, C2, C3 are measured on each
individual.



Outputs:

the discriminant function serves to
classify a new individual into his
appropriate group on the basis of his
characteristics.






applications
A few applications of discriminant analysis:

Subject                         Reference

judgments in cooking            Baten
brand loyalty                   Farley
coal analysis                   Baten (2)
earth sciences                  Miller, ch. 12
educational methods             Baten (3)
weather prediction              Glahn, p. 122
textile research                Baten (4)
medicine                        Radhakrishna
soil differentiation            Cox
linguistic problems             Mustonen
biological taxonomy             Burnaby
U.S. National Health Survey     Fisher
physical anthropology           Ashton
meteorology                     Lund
basaltic lava discrimination    Chayes



catalog of programs
Programs available from "IBM Catalog of Programs" (Ref. IBM) in
discriminant analysis and related areas:

Subject                    Computer   Program Form Number

classification             1620       1620-06.0.076, .201, .208
discrimination             7094       7094-BMD 04M, 05M
discrimination routines    360        360A-CM-03X
                           1130       1130-CM-02X



use of IBM systems
Citations in trade journals, periodicals and texts on the use of IBM
systems in carrying out discriminant analysis:

Subject                    Computer   Reference

classification             7094       Kossack
multivariate statistics    7090/4     Jones
health survey              704        Fisher

EXPERIMENTAL DESIGN AND ANALYSIS

in essence
Experimental design and analysis methods decide whether various
factors or combinations of factors influence a result.

example in aluminum
The impact extrusion of aluminum turned out to be influenced by 7
alloy variations, 4 process variables, and 3 annealing temperatures.
(Ref. Pehlke)

example in ceramics
The physical properties of a semivitreous body were shown by
experimental design to depend on such factors as particle size, water,
entrapped air, thickness, firing rate. The water, air, and thickness
variables appeared linear; particle size was nonlinear. Combinations
between levels of particle size and other factors caused important
changes. (Ref. Conrad)

summary
One starts with measured outcomes for various combinations of
influencing factors (many at preselected values, some simply observed).
Experimental design and analysis techniques sort out the important
factors or combinations of factors producing the outcome. A host of
"designs" are used. Randomization and repetition of runs ensure
confidence in results.
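For factors tried at two levels each, the sorting out of important factors and combinations can be sketched in Python as a comparison of averages. This is a simplified illustration that stops short of a full analysis of variance; the design (three factors, every combination run twice), the coded levels -1 and +1, and the outcome values are hypothetical.

```python
from itertools import product

def effect(runs, *factors):
    """Average outcome where the product of the coded levels is +1,
    minus the average where it is -1. One factor gives a main effect;
    two factors give their combination (interaction) effect."""
    high, low = [], []
    for levels, outcome in runs:
        sign = 1
        for name in factors:
            sign *= levels[name]
        (high if sign > 0 else low).append(outcome)
    return sum(high) / len(high) - sum(low) / len(low)

# Hypothetical outcomes: every combination of three two-level factors, each run twice.
runs = []
for f1, f2, f3 in product((-1, 1), repeat=3):
    base = 67.0 + 2.0 * f1 + 4.0 * f2 + 1.5 * f1 * f2  # F3 has no influence here
    for deviation in (0.5, -0.5):  # stand-in for run-to-run variation
        runs.append(({"F1": f1, "F2": f2, "F3": f3}, base + deviation))

main_f1 = effect(runs, "F1")
main_f2 = effect(runs, "F2")
main_f3 = effect(runs, "F3")
combo_f1_f2 = effect(runs, "F1", "F2")
```

In this made-up data F2 shows the largest effect and F3 none; a full analysis would go on to compare each effect with the variation between repeated runs before declaring it real.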



[Figure: input layout. Outcome D is recorded, twice, for each combination
of F1 (-5, +5), F2 (70, 80), and F3 (1, 2).]

MODEL: MAIN EFFECT + F1 + F2 + F3 + F1*F2 + E



EXPERIMENTAL
DESIGN AND ANALYSIS



GRAND MEAN = 67.2

SOURCE OF    DEGREES OF   MEAN
VARIATION    FREEDOM      SQUARES

F1            1            63.1
F2            1           262.8
F3            1            50.2
F1*F2         1            85.1
ERROR        11           102.9
TOTAL        15





Inputs:

three factors F1, F2, F3, each with two
levels (F1 at -5 and +5 for example) are
examined for their effect on outcome D,
both separately and in F1*F2 combination.
These separate contributions are
over and above an average main effect
and error, E. The outcome D is
measured twice (replicated).



Outputs:

the so-called "mean squares" and
"degrees of freedom" determine whether
or not the factors or combinations really
affect the outcome. Other models
including new factors or triple combinations
F1*F2*F3 can be tried.
applications
A few applications of experimental design and analysis:

Subject                                    Reference

testing rocket engines                     Wood
geology, paleontology, etc.                Miller, ch. 7
electric machinery electrodes              Hicks
psychological experiments                  Chan
machinery metals                           Hamaker
psychotropic problems                      Hall
transistor industry                        Hamaker
aluminum impact extrusions                 Pehlke
castings                                   Brownlee
textiles, cotton spinning                  Peake
general review                             Hunter
steels, Ausforming process                 Duckworth
chemical tests of textiles                 Bainbridge
paper pack seals                           Moore, p. 307
soaps, R/D, manufacturing, marketing       Michaels
U.S. Patent Office, transistor circuits    Bryant, p. 204
automobile purchasing                      Jung
ceramics                                   Conrad
hardening of steels                        Hopkins, A. D.
etc.



catalog of programs
Programs available from "IBM Catalog of Programs" (Ref. IBM) in
experimental design and related areas:

Subject                 Computer   Program Form Number

variance analysis       7070       7070-11.5.002
covariance analysis     1620       1620-06.0.023, .024, .025,
                                   .032, .080, .092,
                                   .107, .109, .129
analysis of variance    1620       1620-06.0.026, .027, .028,
                                   .029, .030, .033,
                                   .041, .043, .060,
                                   .061, .062, .065,
                                   .069, .070, .081,
                                   .083, .086, .087,
                                   .088, .089, .102,
                                   .105, .113, .123,
                                   .132, .139, .140,
                                   .152, .161, .174,
                                   .176, .202, .207,
                                   .210, .213
variance, covariance    7040       7040-G2 3365ANOV
analysis of variance    7090       7090-G4 3027ANAWZ
factorial analysis      7094       7094-G4 3337ANV
factorial analysis      1401       1401-06.0.012, .014
factorial design        360        360A-CM-03X
                        1130       1130-CM-02X
use of IBM systems
Citations in trade journals, periodicals, and texts on the use of IBM
systems in carrying out experimental design and analysis:

Subject                        Computer   Reference

variance, mean difference      7072       Turk
paired comparisons             7090       Gulliksen
electroencephalogram, anova    7090       Sorkin
Latin squares                  7090       Gilbert, E.N.
incomplete factorial           7094       Webb
variance, covariance           7090       Finn
multivariate statistics        7090/4     Jones
analysis of variance           7074       Hemmerle
analysis of variance           7070       Bendig
BIMD programs                  7090       Massey, F.
factor analysis                1410       Cientat
analysis of variance           709/90     Hopkins
variance, covariance (7)       1401       Sterling

EVOLUTIONARY OPERATION

in essence
Evolutionary operation resets variables step by step to make a process
better.

example in chemicals
A high-cost organic chemical process has yields depending on two
important factors: reaction time and weight of additives. Starting
from time = 75 minutes and weight = 120 pounds, evolutionary operation
gradually improved yield from 562 to 588 pounds by adjusting time to
25 minutes and weight to 110 pounds. (Ref. De Busk)

example in petroleum
In a catalytic cracking application four variables (feed rate, reactor
temperature, recycle, and space velocity) were adjusted to maximize
catalytic distillate yield and catalytic light gas oil yield. The levels
and order of successive runs were imposed by operating personnel.
(Ref. Klingel)

summary
Given a process whose yield (or some other response) is a function of
a number of factors, evolutionary operation tries small changes in two
or three controlling factors to improve the yield gradually. The slight,
deliberate changes in process variables must cause some improved
response over and above mere random fluctuation.
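One cycle of this comparison, for two factors tried at a central operating point and four surrounding settings, can be sketched in Python. The arithmetic follows the usual two-factor layout with a center point; the function name, the response values, and the limit of 1.5 used to separate real improvement from random fluctuation are hypothetical.

```python
def evop_effects(center, corners):
    """Effects from one cycle around the current operating point.

    corners maps (level of A, level of B), each coded -1 or +1, to the
    average response observed at that setting; center is the average
    response at the current operating point.
    """
    a = (corners[(1, 1)] + corners[(1, -1)] - corners[(-1, 1)] - corners[(-1, -1)]) / 2
    b = (corners[(1, 1)] + corners[(-1, 1)] - corners[(1, -1)] - corners[(-1, -1)]) / 2
    ab = (corners[(1, 1)] + corners[(-1, -1)] - corners[(1, -1)] - corners[(-1, 1)]) / 2
    # Average over all five settings compared with running only at the center.
    change_in_mean = (sum(corners.values()) + center) / 5 - center
    return {"A": a, "B": b, "A*B": ab, "MEAN": change_in_mean}

# Hypothetical average responses for one cycle.
corners = {(-1, -1): 48.0, (1, -1): 49.0, (-1, 1): 52.0, (1, 1): 55.0}
effects = evop_effects(50.0, corners)

# An effect counts only if it exceeds the random fluctuation (assumed limit).
noise_limit = 1.5
real = sorted(name for name, value in effects.items() if abs(value) > noise_limit)
```

Here both factors show effects larger than the assumed noise limit, B more strongly than A, so the process would be moved toward the high setting of B before the next phase begins.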



[Figure: two panels, each plotting the response at 5 settings on axes
F1 and F6, before and after one round of EVOLUTIONARY OPERATIONS.]

PHASE 6, CYCLE 10
RESPONSE AT 5 SETTINGS: 46, 47, 45 (center), 42, _

RESPONSE IMPROVEMENT
DUE TO:     AMOUNT
F1           .6 ± .9
F6          1.9 ± .9
F1*F6        .1 ± .9
MEAN         .3 ± .8



EVOLUTIONARY
OPERATIONS



PHASE 6, CYCLE 11
RESPONSE AT 5 SETTINGS: 48, 49, 47 (center), 44, 45

RESPONSE IMPROVEMENT
DUE TO:     AMOUNT
F1          -.2 ± .6
F6           .8 ± .6
F1*F6        .1 ± .6
MEAN         .4 ± .5



Inputs:

a process has been operating during
previous phases (with other settings for
factors, or other factors, or another
response variable). Factor F6 at its high
level causes the response to exceed mere
"noise" (±.9). The process was then
shifted permanently to this high value of
F6 and evolutionary operations begun.



Outputs:

response is measured at settings of F1
and F6, around the central operating
point. Again the high setting of F6
improved the response. F1 did not affect
the response; however, next cycle
around, F1's operating limits were
spread wider to discover its influence.
applications
A few applications of evolutionary operation:

Subject                              Reference

geological facies maps               Miller, p. 394
alloying process, quality control    Bingham
textiles, spinning machinery         Dudley
chemical process                     De Busk
catalytic cracking                   Klingel
response surfaces, general           Bradley
cutting-tool equation                Wu
chemical process                     Kochler
chemical plants                      Box
quality in plastic processes         Harrington
BAYES FORMULA



in essence 



Bayes formula finds probable causes from measured effects. 



example in medicine

Medical knowledge is available in the form of probabilities of a complex
of symptoms arising from a given disease complex. Also known are
the frequencies of occurrence of the disease complexes. Bayesian
techniques discover the most probable disease complex from a newly
presented set of symptoms. (Ref. Ledley, Sect. 12-4)

example in quality control

In sampling and quality control there are subjective estimates for the
chances that different fractions of a lot are defective. On the basis
of new information from samples, Bayesian analysis revises these
subjective estimates. (Ref. Locke)

summary

Given the probabilities that certain effects follow certain causes (based
on historical data), and given the frequencies of occurrence (or
probabilities) of the causes (based on historical or subjective
knowledge), Bayes formula finds the probability of a cause behind a
newly presented effect. Strictly speaking, Bayesian statistics or
Bayesian decision theory is a much wider field. Here we narrow attention
to using Bayes formula to deal with what can be looked upon as causes (C)
and effects (E). The probability, P(Ci|Ej), of a cause Ci given the
presence of an effect Ej is

$$P(C_i \mid E_j) = \frac{P(C_i)\,P(E_j \mid C_i)}{\sum_{k} P(C_k)\,P(E_j \mid C_k)}$$

where the sum in the denominator runs over all causes k (all diseases,
in the medical example).

In the medical example, symptoms and diseases take the place of effects
and causes. In the quality control example, sample information (effect)
revises subjective estimates of the fraction (cause) of lot defectives.
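
As a minimal sketch of the formula at work, the Python below computes the probabilities of three causes given one observed effect. The function name posterior, the cause and effect labels, and every probability value are hypothetical, chosen only so the arithmetic is easy to follow; none of them come from the references.

```python
def posterior(priors, likelihoods, effect):
    """Apply Bayes formula for one observed effect.

    priors:      {cause: P(cause)}
    likelihoods: {cause: {effect: P(effect | cause)}}
    Returns {cause: P(cause | effect)}.
    """
    joint = {c: priors[c] * likelihoods[c][effect] for c in priors}
    total = sum(joint.values())  # denominator: sum over all causes
    if total == 0:
        raise ValueError("effect has zero probability under every cause")
    return {c: j / total for c, j in joint.items()}


# Hypothetical inputs: three causes, two effects.
priors = {"C0": 0.6, "C1": 0.3, "C2": 0.1}
likelihoods = {
    "C0": {"E0": 0.1, "E1": 0.9},
    "C1": {"E0": 0.2, "E1": 0.8},
    "C2": {"E0": 0.8, "E1": 0.2},
}

post_e0 = posterior(priors, likelihoods, "E0")
most_probable = max(post_e0, key=post_e0.get)

for cause, p in post_e0.items():
    print(f"P({cause}|E0) = {p:.2f}")
print("most probable cause of E0:", most_probable)
```

In this made-up case the rarest cause, C2, becomes the most probable one once E0 is observed, because E0 follows C2 far more often than it follows the other causes.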



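[Figure: BAYESIAN ANALYSIS. Inputs are a table of probabilities of
effects given causes, P(E0/C0), P(E0/C1), P(E0/C2), P(E1/C0), P(E1/C1),
P(E1/C2), ..., together with the probabilities of the causes, P(C0),
P(C1), ... Outputs are a table of probabilities of causes given effects:
P(C0/E0) = .2, P(C1/E0) = .1, P(C2/E0) = .4, ..., and likewise P(C0/E1),
P(C1/E1), P(C2/E1), ...]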



Inputs:

P(E1/C2) means the probability of effect E1 given the existence of
cause C2. These kinds of probabilities are known, together with the
probabilities of the causes; P(C2) is that of cause C2.

Outputs:

Probabilities of causes from a knowledge of effects. There is a
probability of .4 that cause C2 underlies effect E0. The most probable
causes for given effects are evident from a scan of a column of
probabilities.
applications

A few applications of Bayesian analysis:

Subject                                     Reference

reliability, quality control                Schafer
sequential life testing                     Ginsburg
quality control                             Locke
competitive bidding, equipment selection    Peterson
clinical trials                             Novick
statistical decision                        Greyson
quality control                             Hamburg
marketing research costs                    Bass
medical diagnosis                           Ledley, Sect. 12-4
hospital engineering                        Aitchison
psychiatric classification                  Birnbaum
control theory                              Ho
criminalistics                              Kingston
military information processing             Kaplan






TIME SERIES ANALYSIS 

in essence

Time series methods analyze, compare, and predict quantities which
change in time.

example in agriculture

Ten years of monthly data on hog production reveal, through time series
methods, a major 12-month cycle, a medium 6-month cycle, and a
minor 4-month cycle. The model accounts for a shifting of peaks and
troughs over the ten-year period. (Ref. Abel)

example in economics

Of 800 time series examined for their relationship to a general business
cycle, 21 series were selected as statistical indicators. Eight led the
general cycle, 5 lagged behind it, and 8 coincided with it. The predictive
ability of the indicators was satisfactory. (Ref. Chou, p. 566)

example in geophysics

To discover underlying geologic structures in inaccessible or covered
regions, aeromagnetic maps are prepared. Autocorrelation and spectral
analysis techniques discover dominant trends, faults, and lithologic
periodicities. (Ref. Horton)

summary

With measurements over time on one or several quantities, time
series techniques (1) smooth out the quantity (weighted averages);
(2) extract important trend-cycle-seasonal ingredients (seasonal
adjustment); (3) discover pure cyclic ingredients (harmonic analysis);
or (4) compare pairs of series (cross-correlation). Prediction is a
major objective. The agriculture example above is a use of harmonic
analysis. The other two examples relate to cross- and auto-correlation.
The geophysics illustration substitutes variables as functions of distance
for variables depending on time.
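
Harmonic analysis of the kind used in the agriculture example can be sketched as follows. The series is synthetic: the amplitudes 3, 1.5, and 0.5 for the 12-, 6-, and 4-month cycles are hypothetical and are not the hog production results. The helper harmonic_amplitude is an illustrative name, and the simple formula it uses assumes the series length is a whole multiple of the period being tested.

```python
import math


def harmonic_amplitude(series, period):
    """Amplitude of the cosine/sine pair with the given period.

    Assumes len(series) is a whole multiple of period, so the
    harmonics are orthogonal and can be estimated one at a time.
    """
    n = len(series)
    a = 2 / n * sum(x * math.cos(2 * math.pi * t / period)
                    for t, x in enumerate(series))
    b = 2 / n * sum(x * math.sin(2 * math.pi * t / period)
                    for t, x in enumerate(series))
    return math.hypot(a, b)


# Ten years of synthetic monthly data with three built-in cycles.
series = [10
          + 3.0 * math.cos(2 * math.pi * t / 12)
          + 1.5 * math.sin(2 * math.pi * t / 6)
          + 0.5 * math.cos(2 * math.pi * t / 4)
          for t in range(120)]

amplitudes = {period: harmonic_amplitude(series, period)
              for period in (12, 6, 4)}

for period, amp in amplitudes.items():
    print(f"{period:2d}-month cycle: amplitude {amp:.2f}")
```

Ranking the amplitudes separates the major cycle from the medium and minor ones. Real data would also carry noise and trend, which this sketch leaves out.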





[Figure: TIME SERIES ANALYSIS. A table of quantities Q1, Q2, Q3, Q4
recorded at times T1, T2, T3, T4, ... feeds four analyses:
(1) moving averages on Q1; (2) seasonal adjustment on Q2;
(3) harmonics of Q3; (4) cross-correlations of Q1 and Q4.]



Inputs:

One or several quantities, Q1, Q2, Q3, Q4, are known over time. A
variety of approaches in time series allow prediction of the
quantities at future times.

Outputs:

(1) 20-day moving averages on Q1 might be 97.8, 96.2, 95.9, ...

(2) Q2 without seasonals might be 15.8, 17.2, 18.3, 16.1, ...

(3) the cycle size of Q3 might be .2 for 7 day, .5 for 14 day, ...

(4) for lags of 0, 1, 2, ... the cross-correlation of Q1 and Q4
    might be .98, .80, .67, .20, ...
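
Outputs (1) and (4) can be illustrated with a short Python sketch. The data are hypothetical, and the two helper names are illustrative. The cross-correlation here is the ordinary correlation coefficient computed on the overlapping part of the two series at each lag, which is one of several definitions in use.

```python
import math


def moving_average(series, window):
    """Unweighted moving average over consecutive windows."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] for lag >= 0,
    computed on the overlapping portion of the two series."""
    xs = x[:len(x) - lag]
    ys = y[lag:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical series: q4 repeats q1 two time steps later.
q1 = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0]
q4 = [0.0, 1.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0]

smoothed = moving_average(q1, window=3)
correlations = {lag: cross_correlation(q1, q4, lag) for lag in (0, 1, 2)}

print("3-point moving averages on Q1:", smoothed)
for lag, r in correlations.items():
    print(f"lag {lag}: cross-correlation {r:.2f}")
```

Because q4 was built as a delayed copy of q1, the correlation reaches 1 at a lag of 2. Scanning the lags for the largest correlation is how a lead or lag between two series shows up.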
applications

A few applications of time series:

Subject                                         Reference

chemical process control, cum sum               Truax
biological systems                              Attinger
tracking (radar, sonar), exp. smoothing         Helms
mobile radio reception, power spectra           Gilbert, E. N.
new-car sales forecasting                       Dyckman
stock prices, random walks                      Fama
consumer attitudes, regressions                 Adams
earth sciences                                  Miller, Ch. 15
electroencephalography, cross-correlation       Ledley, Sect. 10-2
X-ray patterns, Fourier series                  Pehlke
hog production, harmonic analysis               Abel
textiles, periodograms, correlograms            Foster
meteorology, power spectra                      Craddock
inventory, production control, exponential      Wagle
economic cycles, spectral analysis              Adelman
marine profiles, spectral analysis              Neidell
labor force, employment, seasonal adjustment    Shiskin
closing stock prices                            Brown
radar data                                      Jenkins
aeromagnetic maps                               Horton



catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
time series analysis and related areas:

Subject                        Computer   Program Form Number

seasonal adjustment            650        0650-06.0.041
autocovariance                 7070       7070-11.2.001, .002
power spectrum                 1620       1620-06.0.005, .056, .126,
                                          .133, .147, .166
seasonal adjustment            1620       1620-06.0.054
business cycles                709        0709-G1 3103 BCA
econometric forecasts          7090       7090-GO 32419 FES
series decomposition           7090       7090-C1 3144 TSDA
seasonal adjustment            1401       1401-CA 04X and -06.0.015
time series                    7094       7094-BMD 01S, 03S
time series routines           360        360A-CM 03X
  (averaging, exponential      1130       1130-CM 02X
  smoothing, Fourier
  analysis, auto- and
  cross-correlation)
use of IBM systems

Citations in trade journals, periodicals, and texts of the use of IBM
systems in carrying out time series analysis:

Subject                        Computer   Reference

upper winds, smoothing         7090       Reiter
autocorrelation                7090       Schmid
geophysics                     7090       Simpson
economic time series           7090       Karreman
power systems spectra          0650       Cooke
filtering                      7094       Whittlesey
sales, exponential smooth      0704       Whelan
X-ray patterns                 0709       Pehlke